On Index Policies for Restless Bandit Problems
Authors
Abstract
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-hard to approximate to any non-trivial factor, and little progress has been made on this problem despite its significance in modeling activity allocation under uncertainty. We make progress on this fundamental problem by showing that for an interesting and general subclass that we term RECOVERING bandits, a surprisingly simple and intuitive greedy policy yields a factor-2 approximation. Such greedy policies are termed index policies, and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by modifying the constraints of the dual of a well-known LP relaxation of the restless bandit problem. This is followed by a structural characterization of the optimal solution, using both the exact primal and the dual complementary slackness conditions. This yields an interpretation of the dual variables as potential functions, from which we derive the index policy and the associated analysis. We believe such techniques will be of independent interest in related stochastic scheduling problems.

The RECOVERING bandit problem strictly generalizes the stochastic multi-armed bandit problem, and naturally models multi-project scheduling where the state of a project becomes increasingly uncertain when the project is not scheduled. Such a notion is similar to (but not quite the same as) the notion of a Partially Observable Markov Decision Process (POMDP).

∗ Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia PA 19104-6389. Email: [email protected]. Research supported in part by an Alfred P. Sloan Research Fellowship, an NSF CAREER Award, and NSF Award CCF-0644119.
† Department of Computer Science, Duke University, Durham NC 27708-0129. Email: [email protected]. Research supported in part by NSF award CNS-0540347.
‡ Duke University, Durham NC 27708. Email: [email protected]. This research was supported by the Duke University Work-Study Program and by NSF award CNS-0540347.

arXiv:0711.3861v1 [cs.DS] 25 Nov 2007
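To make the term "index policy" concrete, the following is a minimal sketch of the general shape of such a policy on a toy recovering-style instance. It is not the index derived in the paper (which comes from the dual of an LP relaxation); here the index is simply assumed to be the arm's immediate reward, and the names `run_index_policy`, `arm0`, `arm1`, and the "steps since last played" state are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def run_index_policy(
    index_fns: Sequence[Callable[[int], float]],
    reward_fns: Sequence[Callable[[int], float]],
    horizon: int,
) -> Tuple[float, List[int]]:
    """Play one arm per step, always choosing the arm with the largest index."""
    n = len(index_fns)
    idle = [1] * n  # assumed state: steps since each arm was last played
    total = 0.0
    plays: List[int] = []
    for _ in range(horizon):
        # An index depends only on the arm's own state, not on the other arms.
        chosen = max(range(n), key=lambda i: index_fns[i](idle[i]))
        total += reward_fns[chosen](idle[chosen])
        plays.append(chosen)
        # The played arm resets; every other arm keeps recovering.
        idle = [1 if i == chosen else idle[i] + 1 for i in range(n)]
    return total, plays


# Toy arms (assumed): reward grows with idle time, then saturates.
def arm0(t: int) -> float:
    return float(min(t, 3))


def arm1(t: int) -> float:
    return 0.5 * min(t, 4)


if __name__ == "__main__":
    # Myopic illustration: each arm's index is its own immediate reward.
    total, plays = run_index_policy([arm0, arm1], [arm0, arm1], horizon=4)
    print(total, plays)
```

Separating `index_fns` from `reward_fns` is deliberate: a different index (for example, one computed from dual variables) could be swapped in without changing the simulation loop.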
Similar resources
Marginal productivity index policies for scheduling restless bandits with switching penalties
We address the dynamic scheduling problem for discrete-state restless bandits, where sequence-independent setup penalties (costs or delays) are incurred when starting work on a project. We reformulate such problems as restless bandit problems without setup penalties, and then deploy the theory of marginal productivity indices (MPIs) and partial conservation laws (PCLs) we have introduced and de...
Index Policies for a Class of Discounted Restless Bandits
The paper concerns a class of discounted restless bandit problems which possess an indexability property. Conservation laws yield an expression for the reward suboptimality of a general policy. These results are utilised to study the closeness to optimality of an index policy for a special class of simple and natural dual speed restless bandits for which indexability is guaranteed. The strong p...
Restless Bandit Marginal Productivity Indices, Diminishing Returns, and Optimal Control of Make-to-Order/Make-to-Stock M/G/1 Queues
This paper presents a framework grounded on convex optimization and economics ideas to solve by index policies problems of optimal dynamic allocation of effort to a discrete-state (finite or countable) binary-action (work/rest) semi-Markov restless bandit project, elucidating issues raised by previous work. Its contributions include: (i) the concept of a restless bandit’s marginal productivity ...
Dynamic priority allocation via restless bandit marginal productivity indices
This paper surveys recent work by the author on the theoretical and algorithmic aspects of restless bandit indexation as well as on its application to a variety of problems involving the dynamic allocation of priority to multiple stochastic projects. The main aim is to present ideas and methods in an accessible form that can be of use to researchers addressing problems of such a kind. Besides b...
Indexability of Restless Bandit Problems and Optimality of Index Policies for Dynamic Multichannel Access
We consider an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the states of the sensed channels. We formulate the problem of optimal sequential channel selection as a restless multi-armed bandit process. We establish the indexabil...